Statistics 1
Why?
Statistics is at the core of data science. It supplies the logic, rules, and techniques of the field, and it is the lens through which we see and interpret data. It provides methods to collect, analyze, interpret, and present data effectively. Understanding statistical concepts is crucial for making data-driven decisions, identifying patterns, and drawing reliable conclusions from data, all essential skills for a data scientist.
What?
This introductory course covers the fundamental concepts of statistics, moving from basic descriptive methods through probability theory to some inferential techniques. Students will learn how to summarize data, understand probability distributions, estimate parameters, and conduct hypothesis tests, building a strong foundation for more advanced statistical analysis.
Curriculum:
What is Statistics
Introduction to the field of statistics, basic terminology, statistical thinking, types of data, and the role of statistics in the scientific method and data analysis.
Describing Data
Methods for organizing, summarizing, and visualizing data including frequency distributions, histograms, scatter plots, and other graphical representations.
Numerical Measures
Calculating and interpreting measures of central tendency (mean, median, mode) and dispersion (range, variance, standard deviation) to describe data distributions.
Survey of Probability Concepts
Introduction to probability theory, rules of probability, conditional probability, Bayes' theorem, and probability as the foundation for statistical inference.
Discrete Probability Distributions
Understanding and applying discrete random variables and probability distributions including binomial, Poisson, and hypergeometric distributions.
Continuous Probability Distributions
Working with continuous random variables, with a focus on the normal distribution, standard normal distribution, and applications to real-world scenarios.
Sampling Methods
Techniques for selecting samples from populations, including simple random sampling, stratified sampling, and cluster sampling, followed by an introduction to sampling distributions.
Estimation and Confidence Intervals
Methods for estimating population parameters using point estimates and interval estimates, and understanding confidence levels and margins of error.
One Sample Tests of Hypothesis
Framework for statistical hypothesis testing, null and alternative hypotheses, p-values, type I and type II errors, and conducting z-tests and t-tests.
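As a rough illustration of how several of these topics connect, the sketch below uses only Python's standard library to compute numerical measures for a small sample, then a 95% confidence interval and a one-sample z-test. The sample values, the hypothesized mean `mu0`, and the assumption that the population standard deviation `sigma` is known are all made up for illustration; they are not taken from the course material.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample (made-up values, for illustration only).
sample = [4.8, 5.1, 5.3, 4.9, 5.6, 5.2, 5.0, 5.4]
n = len(sample)

# Numerical measures: central tendency and dispersion.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Assumptions for the z-procedures below: the population standard
# deviation is treated as known, and mu0 is the hypothesized mean.
sigma = 0.3
mu0 = 5.0
std_error = sigma / math.sqrt(n)

# 95% confidence interval for the population mean.
z_crit = NormalDist().inv_cdf(0.975)
ci = (mean - z_crit * std_error, mean + z_crit * std_error)

# Two-sided one-sample z-test of H0: mu = mu0.
z_stat = (mean - mu0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"mean={mean:.4f} median={median:.4f} stdev={stdev:.4f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"z={z_stat:.3f} p-value={p_value:.4f}")
```

When `sigma` is not known and the sample is small, a t-test based on the sample standard deviation would normally be used instead; the z-test is shown here only because the standard library covers the normal distribution directly.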
Notes
Some of the concepts may seem to have little practical use, and that may be true. However, they still play a key role in understanding more complex concepts and proofs in more advanced topics in statistics and machine learning. For example, understanding distributions will help you understand Linear Regression (in Stat 3) and, afterwards, several Machine Learning models that are based on maximum likelihood estimation.
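To make the link between distributions and maximum likelihood estimation a little more concrete, here is a minimal sketch. It assumes the data come from a normal distribution and checks numerically that the log-likelihood is highest near the sample mean. The data values, the helper `normal_log_likelihood`, and the grid of candidate means are hypothetical and chosen only for illustration.

```python
import math
import statistics

# Hypothetical observations (made-up values, for illustration only).
data = [2.1, 2.9, 3.4, 2.7, 3.1, 2.5]


def normal_log_likelihood(values, mu, sigma):
    """Log-likelihood of the values under a Normal(mu, sigma) model."""
    n = len(values)
    squared_error = sum((x - mu) ** 2 for x in values)
    return (
        -n * math.log(sigma)
        - 0.5 * n * math.log(2 * math.pi)
        - squared_error / (2 * sigma ** 2)
    )


# Closed-form maximum likelihood estimates for a normal model:
# the sample mean, and the standard deviation with an n denominator.
mu_closed = statistics.fmean(data)
sigma_closed = statistics.pstdev(data)

# Coarse grid search over candidate means from 2.00 to 4.00,
# holding sigma fixed at its closed-form estimate.
candidates = [2.0 + 0.01 * i for i in range(201)]
mu_grid = max(
    candidates,
    key=lambda mu: normal_log_likelihood(data, mu, sigma_closed),
)

print(f"closed-form mean estimate: {mu_closed:.4f}")
print(f"grid-search mean estimate: {mu_grid:.2f}")
```

The grid search is deliberately naive. Its only purpose is to show that "the parameter value that makes the observed data most likely" and "the sample mean" coincide for this model, up to the grid resolution.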